Major League Baseball’s 2021 season has been marked by record-breaking statistics: record-low ERAs, extremely high spin rates, and a record-breaking number of no-hitters just a few months into the season. Baseball is a game many consider boring for its lack of action…
- The average spin rate prior to June 1st was around 2280 rpm.
- Mention the original steroid era.
- Data runs from the start of the season to June 1, and from June 1 to July 17, from Baseball Savant.
- Sinkers and curveballs.
For a long time, MLB teams had what some would call a gentleman’s agreement over the use of foreign substances. This started as a way for pitchers to improve their grip on the ball. Now, however, the substances have become so sticky that instead of just helping pitchers grip the ball, they make the ball stick to the hand for longer, increasing the spin rate and making the pitch harder to hit. As spin increases, pitches also tend to arrive higher in the zone, adding to the difficulty hitters face. The spin on a pitch like the 4-seam fastball is backspin, which produces lift that opposes the downward pull of gravity (think of the term ‘rising fastball’).
MLB came out against foreign substances, stating that starting on June 1st umpires would begin checking pitchers for them, and on June 16th announced that players caught using them would be removed from the game, suspended for 10 games, and fined. Many people have argued that these measures won’t deter pitchers from using foreign substances, yet many analysts have noted spin rates going down and batting averages going up. In this notebook we will examine the effect of MLB’s new enforcement policy, both league-wide and for individual players.
All of the data comes from Baseball Savant and was retrieved using its search feature.
It’s important to note that, in general, spin rate increases as velocity increases. In this section we will compare pitchers’ spin rates and velocities before and after June 1st.
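As a rough check of that relationship, the sketch below computes the Pearson correlation and a least-squares slope between velocity and spin rate. The data frame `toy` and its values are made up for illustration; only the column names `velocity` and `spin_rate` mirror the Baseball Savant export used in this notebook.

```r
# Hypothetical values for illustration only (not Baseball Savant data)
toy <- data.frame(
  velocity  = c(88, 90, 92, 94, 96),           # mph
  spin_rate = c(2100, 2200, 2250, 2400, 2450)  # rpm
)

# Pearson correlation between velocity and spin rate
r <- cor(toy$velocity, toy$spin_rate)

# Slope of a simple linear fit: estimated rpm gained per additional mph
fit <- lm(spin_rate ~ velocity, data = toy)
rpm_per_mph <- unname(coef(fit)["velocity"])

print(r)
print(rpm_per_mph)
```

The same two calls could be pointed at the real before and after data frames once they are loaded, which would show whether the relationship itself shifted after June 1st.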
library(ggplot2) ## loading packages
library(ggExtra)
library(dplyr)
library(tidyverse)
b4 <-
read.csv('b4June1.csv')
b4
b4gr <-
b4 %>%
ggplot(aes(x = velocity, y = spin_rate, color = total_pitches)) +
geom_point(stat = 'identity') +
xlab('Velocity (mph)') +
ylab('Spin Rate (rpm)') +
ggtitle('Spin Rate VS Velocity Prior to June 1st') +
labs(color = 'Total Pitches')
##using ggExtra package to add histograms to show the distribution of data within the scatterplot
p1 <-
ggMarginal(b4gr, type = 'histogram')
p1
This data shows that pitchers’ average spin rates are generally concentrated between 2200 rpm and 2600 rpm, and that velocities range from the upper 80s to the mid 90s (mph).
aft <-
read.csv('aftJune1st.csv')
aft
aftgr <-
aft %>%
ggplot(aes(x = velocity, y = spin_rate, color = total_pitches)) +
geom_point(stat = 'identity') +
xlab('Velocity (mph)') +
ylab('Spin Rate (rpm)') +
ggtitle('Spin Rate VS Velocity After June 1st') +
labs(color = 'Total Pitches')
p2 <-
ggMarginal(aftgr, type = 'histogram')
p2
p1
Based on this graph of the data following June 1st, there doesn’t seem to be much of a difference: velocities are concentrated between the upper 80s and lower 90s, and spin rates between 2000 rpm and 2500 rpm. On closer inspection of the scatterplot, most of the outliers are the darker-shaded dots representing pitchers who have thrown fewer than 250 pitches. Because this data is based on averages, and smaller sample sizes are generally less representative of the individual, I wanted to see whether a difference would appear if I raised the minimum number of pitches to at least 250.
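To illustrate why a minimum pitch count matters, here is a minimal sketch that compares the spread of average spin rates for pitchers below and at or above the threshold. The rows in `toy` are invented for illustration; `min_pitches` is a hypothetical name for the 250-pitch cutoff described above.

```r
# Hypothetical per-pitcher averages for illustration only
toy <- data.frame(
  spin_rate     = c(1800, 2900, 2250, 2300, 2350, 2280),
  total_pitches = c(12, 40, 300, 450, 600, 820)
)

min_pitches <- 250  # threshold used in this notebook

# Label each pitcher by whether they meet the minimum pitch count
toy$group <- ifelse(toy$total_pitches >= min_pitches,
                    "at_least_250", "under_250")

# Standard deviation of average spin rate within each group
spread <- tapply(toy$spin_rate, toy$group, sd)
print(spread)
```

A much larger spread in the low-volume group would be consistent with those pitchers driving the outliers, though with real data the gap may be smaller than in this toy example.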
b4abv <-
b4%>%
filter(total_pitches > 249)
b4abv
aftabv <-
aft%>%
filter(total_pitches > 249)
aftabv
babvgr <-
b4abv %>%
ggplot(aes(x = velocity, y = spin_rate, color = total_pitches)) +
geom_point(stat = 'identity') +
xlab('Velocity (mph)') +
ylab('Spin Rate (rpm)') +
labs(color = 'Total Pitches',
title = 'Spin Rate vs Velocity Before June 1st',
caption = 'For pitchers with at least 250 pitches') +
scale_color_viridis_c()
afabvgr <-
aftabv %>%
ggplot(aes(x = velocity, y = spin_rate, color = total_pitches)) +
geom_point(stat = 'identity') +
xlab('Velocity (mph)') +
ylab('Spin Rate (rpm)') +
labs(color = 'Total Pitches',
title = 'Spin Rate vs Velocity After June 1st',
caption = 'For pitchers with at least 250 pitches') +
scale_color_viridis_c()
k1 <-
ggMarginal(afabvgr, type = 'histogram')
k1
g1 <-
ggMarginal(babvgr, type = 'histogram')
g1
Here we can see a more notable difference in average spin rates and velocities. Prior to June 1st, average velocities were generally between the upper 80s and mid 90s, with spin rates ranging between 2100 and 2600 rpm. The extremes included 5 pitchers with spin rates above 2750 rpm and 20 pitchers above 95 mph. In contrast, after June 1st velocities are concentrated between the mid 80s and lower 90s, and spin rates between 2000 and 2500 rpm. The extremes now include only 1 pitcher with an average spin rate above 2750 rpm and 3 pitchers above 95 mph. Additionally, the frequency distribution of velocities changed from left-skewed to approximately normal, while the frequency distribution of spin rates went from bimodal to approximately normal following June 1st.
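The counts of extreme pitchers quoted above can be tallied directly rather than read off the plot. The sketch below assumes the same `velocity` and `spin_rate` columns; `count_extremes` is a hypothetical helper and the rows in `toy` are invented for illustration.

```r
# Count pitchers whose averages exceed the given cutoffs
count_extremes <- function(df, spin_cut = 2750, velo_cut = 95) {
  c(
    high_spin = sum(df$spin_rate > spin_cut, na.rm = TRUE),
    high_velo = sum(df$velocity > velo_cut, na.rm = TRUE)
  )
}

# Hypothetical rows for illustration only
toy <- data.frame(
  velocity  = c(96.1, 93.4, 95.5, 89.9),  # mph
  spin_rate = c(2800, 2400, 2700, NA)     # rpm, NA = missing reading
)

extremes <- count_extremes(toy)
print(extremes)
```

Calling the helper on `b4abv` and `aftabv` would give the before and after counts under the same cutoffs.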
In this section we will examine the change in performance (if any) for specific pitchers: Trevor Bauer, Gerrit Cole, and Garrett Richards.
It’s hard to have any conversation about foreign substances in baseball without mentioning Trevor Bauer. From initially speaking out against foreign substances, once even stating that they could be more powerful than steroids, to now being accused of having the most effective sticky-substance combination in the league, Bauer’s name has come up a lot. Because of the recent allegations against Bauer and his absence from play since his last game on June 28th, less data is available for him than for other players.
bauer <- ## downloaded from Baseball Savant
read.csv('bauer advanced stats.csv')
bauer
num <-
bauer %>%
group_by(pitch_name) %>%
summarise(count = n())
num
From this data we can see that during the month of June, Bauer’s most utilized pitches were the fastball, cutter, and slider.
tbju28 <-
bauer %>%
filter(game_date == '2021-06-28')
tbju28
tb2avgs <- ## finding game averages from the June 28th game
tbju28 %>%
select(pitch_name, release_speed, release_spin_rate) %>%
group_by(pitch_name) %>%
summarise(avgspeed = mean(release_speed, na.rm = TRUE), avgspin = mean(release_spin_rate, na.rm = TRUE), count = n()) %>%
arrange(pitch_name)
tb2avgs
From this we can see that Bauer’s fastest pitches on average are his sinker, fastball, and changeup. His pitches with the highest spin rates are the knuckle curve, slider, cutter, and fastball, all well above the average spin rate of 2280 rpm.
Bauer’s most utilized pitches this game:
Cutter (medium velocity & high spin rate)
Slider (low velocity & high spin rate)
Fastball (high velocity & spin rate)
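A usage ranking like the one above, together with each pitch's gap from the 2280 rpm average, could be derived along these lines. This is only a sketch: the rows in `toy` are invented, `league_avg_spin` is a hypothetical name for the average cited in this notebook, and base R is used here instead of dplyr to keep the example dependency-free.

```r
# Hypothetical pitch-level rows for illustration only
toy <- data.frame(
  pitch_name        = c("Cutter", "Cutter", "Slider",
                        "4-Seam Fastball", "Cutter", "Slider"),
  release_spin_rate = c(2900, 2850, 2950, 2800, NA, 3000)
)

league_avg_spin <- 2280  # rpm, the pre-June 1st average cited in this notebook

# Pitch counts, most used first
usage <- sort(table(toy$pitch_name), decreasing = TRUE)

# Average spin per pitch type, ignoring missing readings
avg_spin <- tapply(toy$release_spin_rate, toy$pitch_name, mean, na.rm = TRUE)

# Difference from the league-wide average
spin_vs_league <- avg_spin - league_avg_spin

print(usage)
print(spin_vs_league)
```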
tb28gr <- ## graph depicting Bauer's spin rates by pitch
tbju28%>%
ggplot(aes(x = at_bat_number , y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity')+
xlab('At Bat Number') +
ylab('Release Spin Rate') +
labs(title = "Trevor Bauer's Spin Rate by Pitch Type",
color = 'Pitch Name',
caption = 'From his last game on June 28th')
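tbju6 <- ## filtering Bauer's June 6th game (needed below; date format follows the June 28th filter)
bauer %>%
filter(game_date == '2021-06-06')
tb6avgs <- ## finding game averages from the June 6th game
tbju6%>%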
select(pitch_name, release_speed, release_spin_rate) %>%
group_by(pitch_name) %>%
summarise(avgspeed = mean(release_speed, na.rm = TRUE), avgspin = mean(release_spin_rate, na.rm = TRUE), count = n()) %>%
arrange(pitch_name)
tb6avgs
Here we can see that in Bauer’s June 6th game, the pitches with the highest speed were the fastball, sinker, and changeup. His pitches with the highest spin rates were the knuckle curveball, slider, cutter, sinker, and fastball, all well above the league average of 2280 rpm.
Bauer’s most utilized pitches:
tb6gr <- ## depicting Bauer's spin rates by pitch from June 6th
tbju6%>%
ggplot(aes(x = at_bat_number , y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity')+
xlab('At Bat Number') +
ylab('Release Spin Rate') +
labs(title = "Trevor Bauer's Spin Rate by Pitch Type",
color = 'Pitch Name',
caption = 'From game pitched on June 6th')
tb6gr
tb28gr <- ## depicting Bauer's spin rates by pitch from June 28th
tbju28%>%
ggplot(aes(x = at_bat_number , y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity')+
xlab('At Bat Number') +
ylab('Release Spin Rate') +
labs(title = "Trevor Bauer's Spin Rate by Pitch Type",
color = 'Pitch Name',
caption = 'From his last game on June 28th')
tb28gr
## pitch spin rt vs velocity
tbsvvg <-
tbju6%>%
ggplot(aes(x = release_speed, y = release_spin_rate, color = pitch_name)) +
geom_point() ## removed the misspelled 'sstat' argument, which ggplot2 ignored with a warning
tbsvvg
tbs2 <-
tbju28%>%
ggplot(aes(x = release_speed, y = release_spin_rate, color = pitch_name)) +
geom_point() ## removed the misspelled 'sstat' argument, which ggplot2 ignored with a warning
tbs2
gcoledt <-
read.csv('Gerrit Cole advanced stats.csv')
gcoledt
gcju3 <-
gcoledt %>%
filter(game_date == '2021-06-03')
gcju3
gcjl10 <-
gcoledt %>%
filter(game_date == '2021-07-10')
gcjl10
gcju3gr <-
gcju3 %>%
ggplot(aes(x = at_bat_number, y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity') +
xlab('At Bat Number') +
ylab('Release Spin Rate') +
labs(title = "Gerrit Cole's Spin Rate by Pitch",
caption = 'From the game pitched on June 3rd',
color = 'Pitch Type')
gcju3gr
gcjl10gr <-
gcjl10 %>%
ggplot(aes(x = at_bat_number, y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity') +
xlab('At Bat Number') +
ylab('Release Spin Rate') +
labs(title = "Gerrit Cole's Spin Rate by Pitch Type",
caption = 'From the game Cole pitched on July 10th',
color = 'Pitch Type')
gcjl10gr
gcju3gr
grichdt <- ## Garrett Richards data from Baseball Savant
read.csv('Garrett Richards advanced stats.csv')
grichdt
grjul9 <- ## filtering the data from Garrett Richards' most recent game
grichdt %>%
filter(game_date == '2021-07-09')
grjul9
grjul9gr <-
grjul9 %>%
ggplot(aes(x = pitch_number, y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity') +
geom_point() +
#geom_smooth() +
xlab('Pitch Number') +
ylab('Release Spin Rate') +
labs(title = "Garrett Richards' Spin Rate by Pitch Type",
caption = 'For game pitched on July 9th',
color = 'Pitch Type')
grjul9gr